Weighted N-gram model for evaluating Machine Translation output
Abstract
I present the results of an experiment on extending an automatic method of Machine Translation evaluation (BLEU) with weights for the statistical significance of lexical items. I show that this extension gives additional information about evaluated texts; in particular, it allows us to measure translation Adequacy, which, for statistical MT systems, is often overestimated by the baseline BLEU method. The proposed model also improves the stability of evaluation scores with a single human reference translation, which increases the usability of the proposed method for practical purposes. The model suggests a linguistic interpretation that deepens our understanding of human intuitions about translation Adequacy and Fluency.
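The general idea of weighting lexical items in an overlap metric can be illustrated with a minimal sketch. The abstract does not say how the significance weights are computed, so the smoothed IDF weighting below is an assumption chosen only for illustration, and the function names `idf_weights` and `weighted_overlap` are hypothetical. The sketch covers unigrams only and is not the paper's model.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed IDF weight per word from a list of tokenised documents.

    Assumption: IDF stands in for the paper's unspecified significance weights.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def weighted_overlap(candidate, reference, weights, default=1.0):
    """Return (precision, recall) over unigrams, each match scaled by its weight."""
    cand, ref = Counter(candidate), Counter(reference)
    # Clipped matches: a word counts at most as often as it occurs in the reference.
    matched = sum(min(c, ref[w]) * weights.get(w, default) for w, c in cand.items())
    cand_total = sum(c * weights.get(w, default) for w, c in cand.items())
    ref_total = sum(c * weights.get(w, default) for w, c in ref.items())
    precision = matched / cand_total if cand_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall
```

With weights of this kind, missing a rare content word lowers the score more than missing a frequent function word, whereas unweighted n-gram matching penalises both equally.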
Similar resources
Evaluating Machine Translation Utility via Semantic Role Labels
We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key ...
Multi-source Neural Automatic Post-Editing: FBK's participation in the WMT 2017 APE shared task
Previous phrase-based approaches to Automatic Post-editing (APE) have shown that the dependency of MT errors from the source sentence can be exploited by jointly learning from source and target information. By integrating this notion in a neural approach to the problem, we present the multi-source neural machine translation (NMT) system submitted by FBK to the WMT 2017 APE shared task. Our syst...
Online Language Model adaptation via N-gram Mixtures for Statistical Machine Translation
The problem of language model adaptation in statistical machine translation is considered. A mixture of language models is employed, which is obtained by clustering the bilingual training data. Unsupervised clustering is guided by either the development or the test set. Different mixture weight estimation schemes are proposed and compared, at the level of either single or all source sentences. ...
Statistical Machine Translation of Parliamentary Proceedings Using Morpho-Syntactic Knowledge
This paper presents an overview of the University of Washington statistical machine translation system developed for the 2006 TCSTAR evaluation campaign. We use a statistical phrase-based system with multiple decoding passes and a log-linear probability model. Our main focus was on exploring the possibility of using morpho-syntactic knowledge (lemmas and part-of-speech tags) for word alignment,...
Semantic vs. Syntactic vs. N-gram Structure for Machine Translation Evaluation
We present results of an empirical study on evaluating the utility of machine translation output by assessing the accuracy with which human readers are able to complete the semantic role annotation templates. Unlike the widely used lexical, n-gram based, or syntax-based MT evaluation metrics, which are fluency-oriented, our results show that using semantic role labels to evaluate the ut...
Publication date: 2003